Mapping the unknown: The spatially correlated multi-armed bandit
Abstract
We introduce the spatially correlated multi-armed bandit as a task coupling function learning with the exploration-exploitation trade-off. Participants interacted with bivariate reward functions on a two-dimensional grid, with the goal of either gaining the largest average score or finding the largest payoff. By providing an opportunity to learn the underlying reward function through spatial correlations, we model the extent to which people form beliefs about unexplored payoffs and how those beliefs guide search behavior. Participants adapted to the assigned payoff conditions, performed better in smooth than in rough environments, and, surprisingly, sometimes performed equally well in short as in long search horizons. Our modeling results indicate a preference for local search options; even when this is accounted for, participants were best described as forming local inferences about unexplored regions, combined with a search strategy that directly traded off between exploiting high expected rewards and exploring to reduce uncertainty about the spatial structure of rewards.
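The inference-plus-search scheme the abstract describes can be sketched in a few lines: a Gaussian Process with a squared-exponential kernel generalizes observed payoffs to unexplored grid cells via spatial correlation, and an upper-confidence-bound rule trades off expected reward against uncertainty. This is a hypothetical illustration, not the authors' exact model; all function names, parameters (`length_scale`, `beta`), and the toy grid are illustrative assumptions.

```python
import numpy as np

def rbf_kernel(A, B, length_scale=1.0):
    """Squared-exponential kernel: payoff correlation decays with distance."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * length_scale ** 2))

def gp_posterior(X_obs, y_obs, X_grid, length_scale=1.0, noise=1e-4):
    """Posterior mean and std over all grid cells given observed payoffs."""
    K = rbf_kernel(X_obs, X_obs, length_scale) + noise * np.eye(len(X_obs))
    K_s = rbf_kernel(X_grid, X_obs, length_scale)
    K_inv = np.linalg.inv(K)
    mu = K_s @ K_inv @ y_obs
    # prior variance is 1 for the RBF kernel; subtract the explained part
    var = 1.0 - np.einsum('ij,jk,ik->i', K_s, K_inv, K_s)
    return mu, np.sqrt(np.clip(var, 0.0, None))

def ucb_choice(X_obs, y_obs, X_grid, beta=0.5):
    """Pick the cell maximizing mean + beta * std (exploit vs. explore)."""
    mu, sd = gp_posterior(X_obs, y_obs, X_grid)
    return int(np.argmax(mu + beta * sd))

# Toy 3x3 grid with two sampled cells
grid = np.array([[x, y] for x in range(3) for y in range(3)], dtype=float)
obs_idx = [0, 4]                      # cells already sampled
obs_y = np.array([0.2, 0.9])          # payoffs observed there
choice = ucb_choice(grid[obs_idx], obs_y, grid)
```

With a larger `beta`, the rule favors uncertain (distant) cells; with `beta = 0`, it greedily resamples near the best observed payoff.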
Similar Resources
A quality assuring multi-armed bandit crowdsourcing mechanism with incentive compatible learning
We develop a novel multi-armed bandit (MAB) mechanism for the problem of selecting a subset of crowd workers to achieve an assured accuracy for each binary labelling task in a cost optimal way. This problem is challenging because workers have unknown qualities and strategic costs.
Gap-free Bounds for Stochastic Multi-Armed Bandit
We consider the stochastic multi-armed bandit problem with unknown horizon. We present a randomized decision strategy which is based on updating a probability distribution through a stochastic mirror descent type algorithm. We consider separately two assumptions: nonnegative losses or arbitrary losses with an exponential moment condition. We prove optimal (up to logarithmic factors) gap-free bo...
Asymptotically Optimal Multi-Armed Bandit Policies under a Cost Constraint
We develop asymptotically optimal policies for the multi-armed bandit (MAB) problem under a cost constraint. This model is applicable in situations where each sample (or activation) from a population (bandit) incurs a known bandit-dependent cost. Successive samples from each population are iid random variables with unknown distribution. The objective is to have a feasible policy for deciding ...
Dynamic Pricing under Finite Space Demand Uncertainty: A Multi-Armed Bandit with Dependent Arms
We consider a dynamic pricing problem under unknown demand models. In this problem a seller offers prices to a stream of customers and observes either success or failure in each sale attempt. The underlying demand model is unknown to the seller and can take one of N possible forms. In this paper, we show that this problem can be formulated as a multi-armed bandit with dependent arms. We propose...
Asymptotic Bayes Analysis for the Finite Horizon One Armed Bandit Problem
The multi-armed bandit problem is often taken as a basic model for the trade-off between the exploration and utilization required for efficient optimization under uncertainty. In this paper we study the situation in which the unknown performance of a new bandit is to be evaluated and compared with that of a known one over a finite horizon. We assume that the bandits represent random variables with di...
Publication date: 2017